Skip to content

feat(hotpatch): remove backfill on old chunks positioning#1514

Merged
MODSetter merged 2 commits into
mainfrom
dev
Jun 18, 2026
Merged

feat(hotpatch): remove backfill on old chunks positioning#1514
MODSetter merged 2 commits into
mainfrom
dev

Conversation

@MODSetter

@MODSetter MODSetter commented Jun 18, 2026

Copy link
Copy Markdown
Owner

Description

Motivation and Context

FIX #

Screenshots

API Changes

  • This PR includes API changes

Change Type

  • Bug fix
  • New feature
  • Performance improvement
  • Refactoring
  • Documentation
  • Dependency/Build system
  • Breaking change
  • Other (specify):

Testing Performed

  • Tested locally
  • Manual/QA verification

Checklist

  • Follows project coding standards and conventions
  • Documentation updated as needed
  • Dependencies updated as needed
  • No lint/build errors or new warnings
  • All relevant tests are passing

High-level PR Summary

This PR removes an expensive backfill operation from a database migration that was updating the position column on historical chunks. The original migration attempted to backfill all existing rows with correct position values, but on large tables with heavy indexing (notably multi-hundred-GB HNSW embedding indexes), this bulk UPDATE caused non-HOT updates that rewrite every secondary index per row, turning what should be a quick migration into a multi-day operation. The new approach intentionally leaves existing rows with position = 0 (falling back to Chunk.id ordering, matching pre-feature behavior) while new and re-indexed documents get correct positions from the application code. Additionally, the position column index is removed from the model since ordering reads are document-scoped and already covered by ix_chunks_document_id.

⏱️ Estimated Review Time: 30-90 minutes

💡 Review Order Suggestion
Order File Path
1 surfsense_backend/app/db.py
2 surfsense_backend/alembic/versions/165_add_chunk_position.py

Need help? Join our Discord

Summary by CodeRabbit

  • Chores
    • Optimized database indexing strategy for chunk storage to improve performance and reduce unnecessary overhead on large data tables.
    • Simplified database migration process by removing complex backfill operations; new data will be properly ordered while existing records maintain current ordering.

- Introduced a new `position` column in the `chunks` table to maintain explicit document order during re-indexing.
- Updated migration to add the column without backfilling historical rows to avoid performance issues on large tables.
- Adjusted the `Chunk` model to reflect the new column without indexing, as ordering reads are document-scoped.
@vercel

vercel Bot commented Jun 18, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
surf-sense-frontend Ready Ready Preview, Comment Jun 18, 2026 4:19pm

Request Review

@MODSetter MODSetter merged commit b233f15 into main Jun 18, 2026
8 of 11 checks passed
@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cda80b49-d0f2-45e4-814a-d40bd36ed4d0

📥 Commits

Reviewing files that changed from the base of the PR and between c941907 and 0c50161.

📒 Files selected for processing (2)
  • surfsense_backend/alembic/versions/165_add_chunk_position.py
  • surfsense_backend/app/db.py

📝 Walkthrough

Walkthrough

Migration 165 is simplified to add chunks.position with a constant NOT NULL default of 0 and clean up a leftover scratch table, dropping all batched backfill logic, index creation, and helper functions. The Chunk model removes index=True from the position column, with comments documenting the intentional omission.

Changes

chunks.position column and migration simplification

Layer / File(s) Summary
Chunk.position model definition without index
surfsense_backend/app/db.py
Chunk.position drops index=True; added comments explain document-scoped ordering and why a standalone position index is omitted.
Migration 165: simplified upgrade/downgrade
surfsense_backend/alembic/versions/165_add_chunk_position.py
Docstring updated to state no historical backfill occurs. upgrade() only adds the position column with default 0 and drops the scratch table. downgrade() only drops the scratch table and removes the column. All backfill helpers, index creation, constants, and the logger are removed.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 No more backfill in the night,
No batch loops running left and right.
Position zero, clean and neat,
A scratch table swept from head to feet.
Less is more — the bunny's delight! 🌿

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dev

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant